Towards a generic test of the strong field dynamics of general relativity using compact 

binary coalescence 



T.G.F. Li\ W. Del Pozzo\ S. Vitale\ C. Van Den Broeck\ M. Agathos\ 
J. Veitchi'2, K. Grover^, T. Sidery^, R. Sturani^'S, A. Vecchio^ 
^ Nikhef - National Institute for Subatomic Physics, 
Science Park 105, 1098 XG Amsterdam, The Netherlands 
^School of Physics and Astronomy, Cardiff University, 
Queen's Buildings, The Parade, Cardiff CF24 3AA, United Kingdom 
^School of Physics and Astronomy, University of Birmingham, 
Edgbaston, Birmingham B15 2TT, United Kingdom 
* Dipartimento di Scienze di Base e Fondamenti, Universita di Urbino, 1-61029 Urbino, Italy 
"'INFN, Sezione di Firenze, 1-50019 Sesto Fiorentmo, Italy 
(Dated: March 6, 2012) 

Coalescences of binary neutron stars and/or black holes are amongst the most likely gravitational- 
wave signals to be observed in ground based interferometric detectors. Apart from the astrophysical 
importance of their detection, they will also provide us with our very first empirical access to the 
genuinely strong- field dynamics of General Relativity (GR). We present a new framework based on 
Bayesian model selection aimed at detecting deviations from GR, subject to the constraints of the 
Advanced Virgo and LIGO detectors. The method tests the consistency of coefficients appearing in 
the waveform with the predictions made by GR, without relying on any specific alternative theory 
of gravity. The framework is suitable for low signal-to-noise ratio events through the construction 
of multiple subtests, most of which involve only a limited number of coefficients. It also naturally 
allows for the combination of information from multiple sources to increase one's confidence in GR 
or a violation thereof. We expect it to be capable of finding a wide range of possible deviations 
from GR, including ones which in principle cannot be accommodated by the model waveforms, on 
condition that the induced change in phase at frequencies where the detectors are the most sensitive 
is comparable to the effect of a few percent change in one or more of the low-order post-Newtonian 
phase coefficients. In principle the framework can be used with any GR waveform approximant, 
with arbitrary parameterized deformations, to serve as model waveforms. In order to illustrate 
the workings of the method, we perform a range of numerical experiments in which simulated 
gravitational waves modeled in the restricted post-Newtonian, stationary phase approximation are 
added to Gaussian and stationary noise that follows the expected Advanced LIGO/ Virgo noise 
curves. 

PACS numbers: 04.80.Nn, 02.70.Uu, 02.70.Rr 



I. INTRODUCTION 



General Relativity (GR) is a non-linear, dynamical 
theory of gravity. Until the 1970s, all of its tests in- 
volved the weak-field, stationary regime; these are the 
standard Solar System tests that are discussed in most 
textbooks, e.g. P^j. GR passed them with impressive ac- 
curacy. Nevertheless, the more interesting part of any 
field theory resides in its dynamics, and this is especially 
true of GR [H [3] . A first test of the latter came from 
the Hulse- Taylor binary and a handful of similar tight 
neutron star binaries (IHS], whose orbital elements are 
changing in close agreement with GR under the assump- 
tion that energy and angular momentum are carried away 
by gravitational waves (GW). Thus, these discoveries led 
to the very first, albeit indirect, evidence for GW. How- 
ever, even the most relativistic of these binaries, PSR 
J0737-3039 [51 [S], is still in the relatively slowly vary- 
ing, weak-field regime from a GR point of view, with a 
compactness of GM/{c?R) ~ 4.4 x 10"'', with M the to- 
tal mass, R the orbital separation, and a typical orbital 
speed v/c ~ 2 x 10"^. By contrast, for an inspiraling 



compact binary, in the limit of a test particle around a 
non-spinning black hole, the last stable orbit occurs at 
a separation of i? = GGM/c^, where GM/{c^R) = 1/6 
and v/c = l/v^- This constitutes the genuinely strong- 
field, dynamical regime of General Relativity, which in 
the foreseeable future will only be empirically accessible 
by means of gravitational-wave detectors. 

Several large gravitational wave observatories have 
been operational for some years now: the two 4 km arm 
length LIGO interferometers in the US |7j, the 3 km arm 
length Virgo in Italy [HI H] , and the 600 m arm length 
GEO600 [inj. By around 2015, LIGO and Virgo wiU have 
been upgraded to their so-called advanced configurations 
[Tmi4| . and shortly afterwards up to tens of detections 
per year are expected [TS]. Another planned GW obser- 
vatory is the Japanese LCGT [16], and the construction 
of a further large interferometer in India is under con- 
sideration [17'. Among the most promising sources are 
the inspiral and merger of compact binaries composed of 
two neutron stars (BNS), a neutron star and a black hole 
(NSBH), or two black holes (BBH). 

Within GR, especially the inspiral part of the coales- 
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cence process has been modeled in great detail using the 
post-Newtonian (PN) formalism (see [18] and references 
therein), in which quantities such as the conserved en- 
ergy and flux are found as expansions in w/c, where v is 
a characteristic speed. During inspiral, the GW signals 
will carry a detailed imprint of the orbital motion. In- 
deed, the contribution at leading order in amplitude has 
a phase that is simply 2$(i), with being the orbital 
phase. Thus, the angular motion of the binary is directly 
encoded in the waveform's phase, and assuming quasi- 
circular inspiral, the radial motion follows from the in- 
stantaneous angular frequency uj{t) — <b{t) through the 
relativistic version of Kepler's Third Law. If there are 
deviations from GR, the different emission mechanism 
and/or differences in the orbital motion will be encoded 
in the phase of the signal waveform, allowing us to probe 
the strong-field dynamics of gravity. In this regard we 
note that with binary pulsars, one can only constrain 
the conservative sector of the dynamics to IPN order 
(i.e. (w/c)^ beyond leading order), and the dissipative 
sector to leading order; see, e.g., the discussion in [19] and 
references therein. Hence, when it comes to $(i), these 
observations do not fully constrain the IPN contribution. 
Yet several of the more interesting dynamical effects oc- 
cur starting from 1.5PN order; this includes 'tail effects' 
[201 HI] and spin-orbit interactions; spin-spin effects first 
appear at 2PN ^22j. As indicated by the Fisher matrix 
results of [53], Advanced LIGO/ Virgo should be able to 
put significant constraints on the 1.5PN contribution to 
the phase, and possibly also higher-order contributions. 

Possible deviations from GR that have been considered 
in the past in the context of compact binary coalescence 
include scalar-tensor theories |24r 29^; a varying Newton 
constant |30j ; modified dispersion relation theories, usu- 
ally referred to in literature as 'massive graviton' models 
[82] [Ml [291 l3Tti36] : violations of the No Hair Theorem 
[35'-13'; violations of Cosmic Censorship [l^J |35j; and 
parity violating theories [HHIZ]. The (rather few) spe- 
cific alternative theories of gravity that have been con- 
sidered in the context of ground-based gravitational wave 
detectors - essentially scalar-tensor and 'massive gravi- 
ton' theories - happen to be hard to constrain much 
further with GW observations, and we will not consider 
them in this paper. However, General Relativity may be 
violated in some other manner, including a way that is 
yet to be envisaged. This makes it imperative to develop 
methods that can search for generic deviations from GR. 

In the past several years, several proposals have been 
put forward to test GR using coalescing compact binary 
coalescence: 

1. One can search directly for the imprint of specific 
alternative theories, such as the so-called 'massive 
graviton' models and scalar-tensor theories [21H^ 
l5TtiMl [5S] . For the 'massive graviton' case, a full 
Bayesian analysis was recently performed by Del 
Pozzo, Veitch, and Vecchio j35j . 

2. A method due to Arun et al. exploits the fact that. 



at least for binaries where neither component has 
spin, all coefficients V'i in the PN expansion (see 
Eq. ([2]) below for their definition) of the inspiral 
phase depend only on the two component masses, 
mi and TO2 [331 Ell IHl- In that case only two of 
the ipi are independent, so that a comparison of 
any three of them allows for a test of GR. Such 
a method would be very general, in that one does 
not have to look for any particular way in which 
gravity might deviate from GR; instead it allows 
generic failures of GR to be searched for. However, 
so far its viability was only explored using Fisher 
matrix calculations. 

3. In the so-called parameterized post-Einsteinian 
(ppE) formalism of Yunes and Pretorius, gravita- 
tional waveforms are parameterized so as to include 
effects from a variety of alternative theories of grav- 
ity [Sni m] . A Bayesian analysis was performed in 

4. Recently a test of the No Hair Theorem was pre- 
sented, also in a Bayesian setting [53) . 

The first method presupposes that GR will be violated 
in a particular way. As for the third and fourth points, 
the particular Bayesian implementation that was used 
involves comparing the GR waveform with a waveform 
model that includes parameterized deformations away 
from GR, thus introducing further free parameters. 

Now, imagine one introduces free parameters pi, i = 
1,2,..., Nt in such a way that pi = for all i corresponds 
to GR being correct. Then one can compare a waveform 
model in which all the Pi are allowed to vary with the GR 
waveform model in which all the pi are zero. As we will 
explain in this paper, this amounts to asking the ques- 
tion: "Do all the pi differ from zero at the same time?" 
Let us call the associated hypothesis -ffi2...JVr, which is 
to be compared with the GR hypothesis 'Hgr- 

A more interesting (because much more general) ques- 
tion would be: "Do one or more of the pi differ from 
zero" , without specifying which. As we shall see, this 
question is more difficult to cast into the language of 
model selection. Below we will call the associated hy- 
pothesis 'HmodGR, to be compared with the GR hypoth- 
esis Hon- What we will show is that, although there is 
no single waveform model that corresponds to 'HmodGRj 
testing the latter amounts to testing 2^^ — 1 hypotheses 
-ffiii2...ifc corresponding to all subsets {pi^^p^^, ■ . ■ ,Pi^} 
of the full set {pi,p2, ■ . . ,PNt}- Each of the hypothe- 
ses Hi-^i^ is tested by a waveform model in which 
Pi-^,Pi^, . . . ,pi^, are free, but all the other pj are fixed to 
zero. The Bayes factors against GR for all of these tests 
can then be combined into a single odds ratio which com- 
pares the full hypothesis "HmodGR with "Hgr- 

In a scenario with low signal-to-noise ratio (SNR), as 
will be the case with advanced ground-based detectors, 
parameter estimation will degrade significantly when try- 
ing to estimate too many parameters at once, and so will 
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model selection if the alternative model to GR has too 
many additional degrees of freedom [35) . This could be 
problematic if one only tests the 'all-inclusive' hypothesis 
Hi2...Nt against GR, i.e., if the question one asks is "Do 
all the Pi differ from zero at the same time" . The ques- 
tion we want to ask instead, namely "Do one or more of 
the Pi differ from zero" , is not only more general; most 
of the sub-hypotheses Hi-^i^,,,i^ involve a smaller num- 
ber of free parameters, making the corresponding test 
more powerful in a low-SNR situation. And, as in most 
Bayesian frameworks, information from multiple sources 
can be combined in a straightforward way. 

Our framework can be used with any family of wave- 
forms with parameterized deformations, including the 
ppE family. In order to illustrate the method, follow- 
ing [221 HHl HH] we make the simplest choice by adopt- 
ing model waveforms in which the deviations away from 
GR take the form of shifts in a subset of the post- 
Newtonian inspiral phase coefficients. To establish the 
validity of the framework, we perform simulations with 
simple analytic frequency domain waveforms in station- 
ary, Gaussian noise that follows the expected Advanced 
LIGO/Virgo noise curves. In the future, for actual tests 
of GR, one may want to use time domain waveforms and 
introduce deviations in e.g. a Hamiltonian used to nu- 
merically evolve the motion of the binary. 

As noted in [351 110] and also illustrated here, if one 
is only interested in detection, then it might suffice to 
only search with template waveforms predicted by GR, 
but parameter estimation can be badly off; this is what 
is called 'fundamental bias'. Indeed, even if there is a 
deviation from GR in the signal, then a model waveform 
with completely different values for the masses and other 
parameters could still be a good fit to the data, with 
minimal loss of signal-to-noise ratio. We note that given 
model waveforms that feature a certain family of defor- 
mations away from GR, 'fundamental bias' can still occur 
if the signal has a deviation that does not belong to the 
particular class of deformations allowed for by the mod- 
els. However, in that case one expects the non-GR model 
waveforms to still be preferred over GR ones in the sense 
of model selection, even if parameter estimation may be 
deceptive in interpreting the nature of the deviation. We 
will show an explicit example of this, and will argue that 
generic deviations from GR can be picked up, of course 
subject to the limitations imposed by the detectors, as 
is the case with any kind of measurement. In particular, 
we expect that a GR violation will generally be visible, 
on condition that its effect on the phase at frequencies 
where the detector is the most sensitive is comparable to 
the effect of a few percent shift in one of the lower-order 
phase coefficients. 

This paper is structured as follows. We first introduce 
our waveform model for compact binary inspiral, and dis- 
cuss its sensitivity to changes in phase parameters (Sec- 
tion In]). In Section III we explain the basic method for 



single and multiple sources. In Section [TV] we construct 
different simulated catalogs of sources and evaluate the 



level at which deviations from GR can be found. We end 
with a summary and a discussion of future steps to be 
taken. 

Unless stated otherwise, we will take G = c = 1. 



II. WAVEFORM MODEL AND ITS 
SENSITIVITY TO CHANGES IN PHASE 
COEFFICIENTS 

We now introduce our waveform model. Since we 
are concerned with testing the strong-field dynamics of 
gravity, eventually all of the effects which we expect to 
see with compact binary coalescence should be repre- 
sented in the waveform. This includes, but is not lim- 
ited to, precession due to spin-orbit and spin-spin in- 
teractions [22], sub-dominant signal harmonics [54], and 
merger/ringdown |55| . In due time these will indeed need 
to be taken into account. However, in this paper we 
first and foremost wish to demonstrate the validity of 
a particular method, for which it will not be necessary 
to use very sophisticated waveforms. Also, as suggested 
by Fisher matrix calculations such as those of Mishra et 
al. [23], methods based on measuring phase coefficients 
will be the most accurate at low total mass. In this pa- 
per we limit ourselves to BNS sources, for which spin will 
be negligible, as well as sub-dominant signal harmonics 
[131 15^ . Since we will assume a network of Advanced 
LIGO and Virgo detectors, the merger and ringdown sig- 
nals will also not have a large impact [57]. Thus, for a 
first analysis we will focus on the inspiral part of the coa- 
lescence process, modeling the waveform in the frequency 
domain using the stationary phase approximation (SPA) 
[58, 59 . In particular, we use the so-called restricted 
TaylorF2 waveforms [BDJ [HI] up to 2PN in phase. 

Since the way we illustrate our method here is based 
on allowing for deviations in phase coefficients, we will 
need to know how sensitive our waveform model is to 
minor changes in the values of these coefficients. To get 
a sense of this, one could use the results of [53] as a 
guide, but since these are based on the Fisher matrix they 
necessarily assume that signal and template are from the 
same waveform family. Before explaining our method for 
testing GR, we will first look at what happens both to 
detectability and parameter estimation when the signal 
contains a deviation from GR but is being searched for 
with a bank of GR templates. 



A. Model waveform(s) and detector configuration 

We start from the way TaylorF2 is implemented in the 
LIGO Algorithms Library [BO]: 

^^^^ ^ ]_ A{e, (j),L,i},M,7j) ^2/3 ^^^(t^,<^^,M,V■J) ^ (1) 
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F{M,mf) 
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where D is the luminosity distance to the source, (0, 4>) 
specify the sky position, (t, V') give the orientation of 
the inspiral plane with respect to the line of sight, M 
is the chirp mass, and 77 is the symmetric mass ratio. 
In terms of the component masses (7711,7712), one has 
?/ = 77117^2/(7)71 + 7712)^ and M = {mi + 7712)77^/^. 
and <j)c are the time and phase at coalescence, respec- 
tively. The 'frequency sweep' F{A4,ri; /) is an expansion 
in powers of the frequency / with mass-dependent coef- 
ficients, and 

*(tc,0c,X,?7;/) 
= 27r/tc - 0c - 7r/4 



I 

J2 [V'^ + V'f^ln/ 



1=0 



/ 



(j-5)/3 



(2) 



In the case of GR, the functional dependence of the co- 
efficients tpi and V'f'' on {M,r]) can be found in [23] . 
However, here we will not assume that those relation- 
ships necessarily hold, except in the case of ipo, which 
has been tested using binary pulsars. With minor abuse 
of notation, let us re-label the remaining coefficients as 
1 = 1,.. . ,M._ 

We note that F is related to the phase ^E", and in prin- 
ciple we should also leave open the possibility that its ex- 
pansion coefficients deviate from their GR values. How- 
ever, with the Advanced LIGO and Virgo network and for 
stellar mass binaries we do not expect to be very sensitive 
to sub-dominant contributions to the amplitude [43|, I56j . 

Let us focus on the phase One way of testing GR 
would be to use a model waveform in which all the ipi are 
considered free parameters, measure these together with 
M and rj, and check whether one obtains agreement with 
the functional relations ipi {A4 , 7/) predicted by GR. How- 
ever, the events we expect in Advanced LIGO /Virgo will 
probably not have sufficient SNR for this to be directly 
feasible [23] . 

Below, we instead suggest a scheme where a large num- 
ber of tests are done, in each of which a specific, limited 
subset of the phase coefficients is left free while the oth- 
ers have the dependence on masses as in GR. The results 
from all of these tests can then be combined into the odds 
ratio for a general deviation from GR versus GR being 
correct. 

In this study we will assume a network of two Ad- 
vanced LIGO detectors, one in Hanford, WA, and the 
other one in Livingston, LA, together with the Advanced 
Virgo detector in Cascina, Italy. We take the Advanced 
LIGO noise curve to be the one with zero-detuning of the 
signal recycling mirror and high laser power :62J. With 
these assumptions, the curves in Fig. [l] represent the 
incoherent sums of the principal noise sources as they 
are currently understood; however, there may be un- 
expected, additional sources of noise. The high-power, 
zero-detuning option gives most of the desired sensitiv- 
ity with the fewest technical difficulties. Advanced Virgo 
can also be optimized for BNS sources by an appropri- 
ate choice of the signal recycling detuning and the signal 



recycling mirror transmittance [13) . and this is what we 
assume here. 




FIG. 1: The higli-power, zero-detuning noise curve for Ad- 
vanced LIGO, and the BNS-optimized Advanced Virgo noise 
curve. 



B. Changes in phase coefficients and detectability 

It is important to investigate to what extent signals 
that may deviate from General Relativity could be de- 
tected in the ffi'st place. The fitting factor (FF) is a 
measure of the adequateness of a template family to fit 
the signal; 1 — FF is the reduction in signal-to-noise ra- 
tio that occurs from using a model waveform which dif- 
fers from the exact signal waveform when searching the 
data. Let he{X) be the 'exact' waveform of the signal and 
hm{0) the model used for detection; the exact and model 
waveforms are dependent on sets of parameters A and ^, 
respectively. The fitting factor is then defined as [63] 



FF = max 



(/ie(A) I hrnie)) 



'{hm{0) I h^{e)){he{X) I /ic(A)), 



, (3) 



where (a | b) denotes the usual noise weighted inner prod- 
uct. 



{a\b) 



a*{f)bif)+aif)b*if) 
Snif) 



df , (4) 



and Sn{f) is the one-sided noise spectral density [19| . 
/mill is the detectors' lower cut-off frequency, and in 
our case, /max is the frequency at last stable orbit, 
f^^^ = (6^/'^TrMr]~^^^)~^. We note that the detection 
rate scales like the cube of the signal-to-noise ratio, so 
that the fractional reduction in event rate is 1 — FF^ 
[63[ . In this case the waveform is not 'exact' in the sense 
of numerical relativity; instead we use the fitting factor as 
a measure of how similar a modified TaylorF2 waveform 
and a GR version are. 
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A sample deviation is tested using a modified wave- 
form different only in the 1.5PN order phase coefRcient: 
ipf^{M,ri) -ipf^iM,!]) [1 + Sxs]- The Advanced 
LIGO noise curve is used. The signal is 'detected' with 
a template bank of standard GR waveforms that is reg- 
ularly spaced in the parameters present in the phase, (j)c, 
tc, Ai, and ?/. We study a (1.4, 1.4) Mq binary with Sxs 
ranging from 0.025 to 0.175. As seen in figures [2] and |3j 
the mass parameters can absorb the change in the phase 
due to the modified phase coefficient while providing a 
fitting factor of over 95%. 

Thus, with a template bank of GR waveforms it is pos- 
sible to detect a signal containing a large deviation from 
GR without significant loss in signal-to-noise ratio, but 
recovered with intrinsic parameters that deviate signifi- 
cantly from the true values. We now describe a method 
which will be able to nevertheless recognize a deviation 
from GR when one is present. 
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FIG. 2: The fitting factors for a range of A4, once the other 
parameters are maximized over. Here a deviation of tps from 
2.5% to 17.5% is used for a (1.4, 1.4) M© system. The ver- 
tical dashed line represents the true value of M, while the 
maximum is offset to compensate for the modification in the 
phase. 
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FIG. 3: The fitting factors for a range of rj, once the other 
parameters are maximized over. Here a deviation of tps from 
2.5% to 17.5% is used for a (1.4, 1.4) Mq system. The vertical 
dashed line represents the true value of rj, while the maximum 
is offset to compensate for the modification in the phase. 



|64) . which was first introduced into ground-based GW 
data analysis by Veitch and Vecchio [65H67] and which 
we will adopt here as well. 

Let us consider hypotheses Hi, Hj. Here Hi could be 
the hypothesis that there is a deviation from GR while Hj 
is the hypothesis that GR is correct; or Hi could simply 
be the hypothesis that a signal of a particular form is 
present in the data while Hj is the hypothesis that there 
is only noise. The statements that we can make about 
any hypothesis are based on a data set d (observations) 
and all the relevant prior information I that we hold. 

Within the framework of Bayesian inference, the key 
quantity that one needs to compute is the posterior prob- 
ability of a hypothesis Hi- Applying Bayes' theorem we 
obtain 



P{H^\d,l) = 



P{H,\l)P{d\H,,l) 
P{d\l) 



(5) 



III. METHOD 

We will first recall some basic facts about Bayesian 
inference. Next we outline our method for finding a pos- 
sible violation of GR using inspiral signals by allowing for 
deviations in phase coefficients. For simplicity, we start 
with an example where only two phase coefficients are 
taken into account; then we go on to the general case. 
Finally we explain how to combine information from a 
catalog of sources. 

A. Bayesian inference and nested sampling 

We now give a brief overview of Bayesian inference, as 
well as the method of nested sampling due to Skilling 



where P{Hi\d,V) is the posterior probability of the hy- 
pothesis Hi given the data, P{Hi\V) is the prior proba- 
bility of the hypothesis, and P{d\Hi,V) is the marginal 
likelihood or evidence for Hi, which can be written as: 

P{d\H,,l) = C{Hi) 

- j d0p{e\H,,I)pid\9,m,l). (6) 

In the above expression, p{9\Hi,l) is the prior probabil- 
ity density of the unknown parameter vector 9 within the 
model corresponding to Hi, and p{d\9,Hi,l) is the like- 
lihood function of the observation d, assuming a given 
value of the parameters 9 and the model Hi. 

If we want to compare different hypotheses. Hi and 
Hj, in light of the observations made, we can compute 
the ratio of posterior probabilities, which is known as the 
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odds ratio: 



B. Basic method for a single source 



O! = 



p{n,\d,i) 

p{d\n^,i) 
p{n,\i) p{d\H„i) 



(7) 



where P{T-Lj\V) / P{T-Li\V) is the prior odds of the two hy- 
potheses, the relative confidence we assign to the models 
before any observation, and is the Bayes factor. 

In addition to computing the relative probabilities of 
different models, one usually wants to make inference 
on the unknown parameters, and therefore one needs to 
compute the joint posterior probability density function 
(PDF) 



p{e\d,H^,l) = 



p{e\H^,l)p{d\e,H,,l) 
pid\H,,l) 



(8) 



From the previous expression it is simple to compute the 
marginalized PDF on any given parameter, say 9i within 
a given model Hi'. 



p{9i\d,m,l) 



d0. 



ddNp{0\d,H,,l). (9) 



The key quantities for Bayesian inference in Eq. ([t]) , ([s]) 
and ^ can be efficiently computed using e.g. a nested 
sampling algorithm [M]. The basic idea of nested sam- 
pling is to use a collection of n objects, called live points, 
randomly sampled from the prior distributions, but sub- 
ject to a constraint over the value of their likelihood. A 
live point ^ is a point in the multidimensional parame- 
ter space. At each iteration, the live point ^* having the 
lowest likelihood €{£,*) is replaced with a new point ^ 
sampled from the prior distribution. To be accepted, the 
new point must obey to the condition 



c{0>c{C). 



(10) 



The above condition ensures that regions of progressively 
increasing likelihood are explored, and the evidence in- 
tegral, Eq. (|6|, is calculated using those points as the 
computation progresses. 

In this paper we use a specific implementation of this 
technique that was developed for ground-based observa- 
tions of coalescing binaries by Veitch and Vecchio; we 
point the interested reader to for technical de- 

tails. To select a new live point, a Metropolis-Hastings 
Markov chain Monte Carlo (MCMC) is used with p steps. 
The uncertainty in the evidence computation for a given 
number of live points n and MCMC steps p was quan- 
tified in [67]. For the calculations in this paper we took 
n — 1000 and p = 100, in which case the standard devi- 
ation on log Bayes factors is 0(1). 



Given that we have no knowledge of which coeffi- 
cient(s) might deviate from the OR values, we want 
to test the hypothesis that at least one of the known 
coefficients {^Oj ^i, "02, • ■ • , V'm} is different. Compu- 
tational limitations and possible lack of sensitivity to 
changes in the higher-order coefficients will induce us to 
only look for deviations in a set of testing coefficients 
{V'l, ■ ■ • , V'Afr} C {■0o,'0i,---,V'm}, with Nt < M. For 
the examples of Sec. |IV[ we will choose Nt = 3 due to 
computational constraints, but a larger number could be 
used. 

Let us introduce some notation. We define hypotheses 
Hiii2...ik as follows: 

Hiii2---ik the hypothesis that the phasing 
coefficients V'li , • ■ • , do not have the func- 
tional dependence on [AA , rj) as predicted by 
General Relativity, but all other coefficients 
V'ji j 4- {*i7 *2, • ■ • , ifc} Ao have the depen- 
dence as in GR. 

Thus, for example, iJi2 is the hypothesis that and ■(/'2 
deviate from their GR values, with all other coefficients 
being as in GR. With each of the hypotheses above, we 
can associate a waveform model that can be used to test 
it. Let Q = {A^,r/, ...} be the parameters occurring in 



the GR waveform. Then Hi, 



is tested by a waveform 



in which the independent parameters are 



{6', V'll , 



(11) 



i. e. the coefficients { V'li i "012 • ■ ■ i '^ik } ^-re allowed to vary 
freely. In practice, these will need to be subsets of a 
limited set of coefficients; in Section|TVj where we present 
results, we will consider all subsets of the set {V'l, '02, "03}- 
This choice will already suffice to illustrate the method 
without being overly computationally costly, but in the 
future one may want to use a larger set. 

Now, the hypothesis we would really like to test is that 
one or more of the i/ji differ from GR, without specifying 
which. This corresponds to a logical 'or' of the above 
hypotheses: 



modGR 



V 



Pli, 



l\<l1<...<lh 



Our aim is to compute the following odds ratio: 

nmodGR _ ^(^modGRM,I) 



■'GR 



P{-HGR\d,l) 



(12) 



(13) 



However, the hypothesis ( 12 ) is not what model wave- 
forms with one or more free coefficients V'i will test; 
rather, such waveforms test the hypotheses Hi-^i^..^^ 
themselves. What we will need to do is to break up the 
logical 'or' in "HmodGR into the component hypotheses 



. . Fortunately, this is trivial. 
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Before treating the problem more generally, let us con- 
sider a simple example. Imagine that only two coefh- 
cients tpi and V'2 are being used for the testing of GR. 
Then 



^modGR — i?l V i?2 V H12. 



(14) 



In this example, the odds ratio of interest can then be 
written as 

(2)^modGR _ PjHiV H,2\d,l) 

- P{non\d,I) ' ^'^^ 
where the superscript (2) reminds us that only two of the 



parameters are being used for testing. 



An important observation is that the hypotheses Hi, 
H2, H12 are logically disjoint: the 'and' of any two of 
them is always false. Indeed, in Hi, ■02 takes the value 
predicted by GR, but in H2 it differs from the GR value, 
as it does in H12. Similarly, in H2, "01 takes the GR value, 
but in Hi it differs from the GR value, and the same in 
Hi2. More generally, any two hypotheses H^-^i^ i^, and 
Hjd2-]i with {2i,i2,...,z/c} ^ {ji,j2,---,ji} are logi- 
cally disjoint. This means that the odds ratio is simply 



(2)QmodGR 



P(ffl|d,I) , PiH2\d.l) , P{Hi2\d,l) 



P(HgrM,I) F(HgrM,I) F(HgrM,I) 



(16) 



Using Bayes' theorem, this can be written as 



(2)QmodGR 



'GR 



^(HgrII) 



^GR 



^(HgrIi; 



^GR 



P{Hi2\I) 

P(Hgr|I) 



^GR- 



(17) 



r 



Here Bq^^, Bqj^, Bq^ are the Bayes factors 



^GR 



^GR - 



r12 

^GR 



P{d\Hi,l) 

p{d\nGR,i)' 

P{d\H2,l) 

P{d\nGR,l)' 

P{d\Hi2,l) 

P{d\HGR,l)' 



with ipf^^{A4,ri) the functional form of the dependence 
of tpi on {M,r]) according to GR, and the dimensionless 
6xi is a fractional shift in -0^. Note that in GR, the 
0.5PN contribution is identically zero; it will be treated 
separately, as explained in section |IV] In the example of 
this subsection, one can assume that '01 j 02 are any of 
the PN coefficients other than the 0.5PN one. With the 



(^Ig'j above notation, the Bayes factors (18) are 



and P(7?i|I)/P(Hgr|I), P{H2\l)/P{nGR\l), 
P{Hi2\T)/P{'Hgr\'^) are ratios of prior odds. 

In practice, we will write the testing coefficients as 



(19) 



^GR 



^GR - 



^GR 



/ dOdSxi ^^^njdxi) 7r{9) p{d\9, Sxi,Hi,l) 

Sde^{e)p{d\e,HGR,i) 

J d9d5x2 ^^^7t(6x2) <e) P{d\e, 5x2, H2, 1) 

jdeTTie)pid\9,nGR,i) 

J d0dSxiddx2 ^'^^7r{Sxi,Sx2)TT{9)p{d\0,Sxi,Sx2,Hi2,l) 

J de^{9)p{d\e,HGR,i) 



(20) 
(21) 
(22) 
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Here ^^^7r((5xi), ^'^^'^{Sx2), ''^^-^'''(^xi, 5x2) are priors for, 
respectively, ^xij SX2, and the pair (i5xij ^X2)- We choose 
these to be constant functions in the relevant parame- 
ter or pair of parameters, having support within a large 
interval or square centered on the origin, and normal- 
ized to one. For the other parameters, 9, we use the 
same functional form and limits as [67], with the excep- 
tion of the distance being allowed to vary between 1 and 
1000 Mpc. Specifically, for the sky location and the ori- 
entation of the orbital plane we choose uniform priors on 
the corresponding unit spheres. For the phase at coa- 
lescence (j)c we choose a flat prior with (p^ G [0, 27r], and 
the time of coalescence tc is in a time interval of 100 ms. 
The prior on 77 is flat on the interval [0,0.25]. For chirp 
mass we use an approximation to the Jeffreys prior which 
gives p{M\T) oc A^~^^/^; see [ST] for motivation. In ad- 
dition, component masses are restricted to the interval 
mi, 7712 e [1,34] Mq. 

At this point it is worth commenting on the mutual 
relationships of the hypotheses Hi^i^ i^, amongst each 
other and with "HgRj and on the waveform models used 
to test them. As an example, let us discuss the case of Hi 
and "Hgr- Con sider the numerator of the Bayes factor 
fiijR in Eq. ([20|: 



Sddxi ^'K{Sxi)7Ti0)p{d\9,SxuHi,l). (23) 



The parameter space of the GR waveforms, {0}, has 
a natural embedding into the parameter space {9,Sxi} 
of the waveforms used to test Hi: it can be identified 
with the hypersurface Sxi = 0. We could have explic- 
itly excluded this hypersurface from {0,Sxi} by setting 
a prior on Sxi of the form ^^^tto{Sxi) = if ^xi — 
and ^'^^TTo{Sxi) — const otherwise. However, this would 
not have made a difference in the integral above; indeed, 
with respect to the integration measure induced by the 
prior probability density on {9, 5xi}, the surface Sxi — 
constitutes a set of measure zero anyway. Now look at 
the denominator in the expression for Sqj^, which is the 
evidence for the GR hypothesis: 



d9 7r{9)pid\9,HGn,l). 



(24) 



Despite the fact that the GR waveforms form a set of 
measure zero within the set of waveforms used for test- 
ing Hi, the above integral is clearly not zero. It is the 
evidence for a qualitatively different hypothesis, whose 
parameter space {9} carries a different integration mea- 
sure with respect to which the marginalization of the 
likelihood is carried out. In particular, "Hgr is not 'in- 
cluded' in the Hi model in any sense that is meaningful 
for model selection. 

Next let us consider H12 . The numerator of the Bayes 



factor in Eq. (22) is 



d9dSxi ddx2 ^''^7riSxiJx2) 7r{9)p{d\9, 6x1,6x2, H12, 1). 

(25) 



The parameter space of the GR waveforms has a natural 
embedding into the parameter space {9, 6x1, 6x2} of the 
waveforms used to test H12, by identification with the 
hypersurface 6x1 = 6x2 = 0. Similarly, the parameter 
spaces of the waveforms we use to test Hi and H2 can 
be identified with the hypersurface 6x2 = and the hy- 
persurface 6x1 = 0, respectively. With respect to the in- 
tegration measure induced by the prior on {6*, 6x1, 6x2}, 
these hypersurfaces have measure zero. Despite this, the 



numerators of Bq^ and -Bqr in Eqns. (201 and (21) are 



not zero, because of the different integration measures. 
With our choices of prior probability densities on the 
different parameter spaces, and the associated waveform 
models, there is no meaningful sense in which Hi or H2 
are 'included' in H12. Thus, we are indeed testing the 
disjoint hypotheses Hi, H2, and Hi2. The latter is the 
hypothesis that both 6x1 and 6x2 differ from zero: the 
prior density ^^^^7r(5xi, 1^X2) gives the surfaces 6x1 — 
and 6x2 = zero prior mass. 

The waveform model that leaves {9, 6x1, 6x2} free cor- 
responds to the hypothesis H12. When computing the 
Bayes factor B^j^ , the question one addresses is "Do tpi 
and ijj2 both differ from what GR predicts?" This is anal- 
ogous to what has been done in recent Bayesian work on 
testing GR [351 ISH [S3]- As we are in the process of 
showing, it is possible to address a more general ques- 
tion, namely "Do ^i, or ^2, or both at the same time, 
differ from their GR values?" 

For completeness, we note that what nested sam- 
pling will give us directly are not the Bayes factors of 
Eqns. (20)"(22), but rather the Bayes factors for the var- 



ious hypotheses against the noise-only hypothesis 



nl _ 



Bi 



B 



12 



B 



GR 



P{d\Hi,l) 

P{d\nnoiseJ)' 

P{d\H2,l) 

P((i|H„oisc: I) 

P{d\Hi2,l) 
P{d\'H^o\sc, I) 

p{d\nGii,i) 

P{d\Hnoisc,l)' 



(26) 



These can trivially be combined to obtain the Bayes fac- 
tors of Eqns. ([20]) -([22]): 



jl noise 



B 

Di " noise r>2 noise 

^GR - ^GR ' ^GR - ^GR > 



tj12 

d12 noise /o'yA 

-^GR-T^Gir- ^^') 

noise 



Now, upon calculating the Bayes factors for each model, 
we would like to combine these measurements into an 
overall odds ratio between the GR model and any of the 
competing hypotheses (Eq. (17)). In order to do this, we 



must specify the prior odds for each model against GR, 
P{Hi\l)/P{nGR\l), P{H2\l)lP{nGK\l), etc. Here one 
might want to let oneself be guided by, e.g., the expec- 
tation that a violation of GR will likely occur at higher 
post-Newtonian order, and give more weight to H2 and 
Hi2. Or, if one expects a deviation to happen only in 
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a particular phase coefficient (such as ip2 in the case of 
'massive gravity'), one may want to downweight the most 
inclusive hypothesis, in this example i?i2- In reality, we 
will not know beforehand what form a violation will take; 
in particular, it could affect all of the PN coefficients. For 
the purposes of this analysis, we invoke the principle of 
indifference among the alternative hypotheses, taking no 
one to be preferable to any other. This imposes the con- 
dition that the prior odds of each against GR are equal. 
We explicitly note that this is a choice of the authors 
for computing final results. This choice results in our ef- 
fectively taking the average of the Bayes factors for the 
alternative hypotheses when we compute the odds ratio 
versus GR. 

When combining the Bayes factors into the odds ratio, 
we therefore assume 



P{H,\1) _ P{H2\l) _ P{Hi2\I) 



P(Hgr|I) P(Hgr|I) P(Hgr|I)' 



Furthermore, we let 



Pin modGR|I) P(-ffi V iJ2 V -ffi2|I) 



PiUoRll) 



P(Hgr|I) 



a, 



(28) 



(29) 



where we do not specify a; it will end up being an overall 
scaling of the odds ratio. This, together with (28 1 and 
the logical disjointness of the hypotheses Hi, H2, H12 
implies 



PjHill) ^ P(g2|I) ^ P{Hi2\l) 

P{Hgr\1) -P(Hgr|I) PiHGRll) 



a 
3' 



(30) 



The final expression for the odds ratio for a modifi- 
cation of GR versus GR, in the case where up to two 
coefficients are used for testing, is then 



(2)QmodGR 



a 



■'GR 



Bgr 



Bgr 



Bgr] 



(31) 



Before continuing to the case of more than two testing 
parameters, let us compare and contrast what is proposed 
here with what was done in previous Bayesian work, e.g. 

[5H There, one introduced free parameters pi 

in the waveform (not necessarily corresponding to shifts 
in phase coefficients; they could, e.g., be shifts in ring- 
down frequencies and damping times), which are zero in 
GR. Next, one constructed a Bayes factor comparing a 
model waveform in which all of the pi are allowed to vary 
freely with the GR model in which all of the pi are fixed 
to zero. In our language and in the case of two testing 
parameters, this corresponds to only comparing the hy- 
pothesis H12 with the GR hypothesis Hgr- Hence, what 
was effectively done in previous work was to address the 
question: "Do all of the additional free parameters differ 
from zero at the same time?" A more interesting ques- 
tion to ask is "Do one or more of the extra parameters 
differ from zero?" This corresponds to testing our hy- 
pothesis "HmodGR- There is no waveform model that can 



be used to test the latter hypothesis directly, but as we 
were able to show, "HmodGR can be broken up into sub- 
hypotheses. Hi, H2, and H12 in the case of two testing 
parameters. With each of these, a waveform model can 
be associated, hence they can be tested against the GR 
hypothesis, Hgr- The resulting Bayes factors can be 
combined into an odds ratio as in Eq. (31 1, which does 
compare the more general hypothesis HmodGR with Hgr- 



C. The general case 

So far we have assumed just two testing coefficients, 
but we may want to use more. In practice it makes sense 
to pick {■01, . . . , iPnt}i < M . The number of coef- 
ficients used, Nt, will be dictated mostly by computa- 
tional cost; in Sec. |IV| we will pick Nt = 3 but a larger 
number could be chosen. We then define 



modGR 



V 



H, 



(32) 



ii<i2<---<ik\k<NT 



When using this set of testing coefficients, the odds ratio 
for 'modification to GR' versus GR becomes: 

(Nt) r)n\odGR 



GR 

P(H.nodGRM,I) 
P{nGR\d,l) 

i<i2<-..<ik;k<Ni 



H, 



Id, I) 



p{nGR\d,i) 



(33) 



where as before, Hi^i^,,,i^ is the hypothesis that 
{"i/^ij , ■!/'i2 1 ■ ■ ■ 1 '4'ik } do not have the functional dependence 
on {M , r]) as predicted by GR, but all of the remaining 
coefficients do. Thus, we are considering the odds ratio 
for one or more of the phase coefficients ipi, . . . , ip^T de- 
viating from GR, versus all of them having the functional 
dependence on masses as in GR. 

Using the logical disjointness of the i/iii2...ij. for differ- 



ent subsets i2, 
can write 



} as well as Bayes' theorem, one 



[Nt] QOiodGR. 
GR 



Nt 

E E 

k=l ii<i2<..-<ifc 



P{H,,^2...^^l) 

P(Hgr|I) 



^GR ' 



where 



Dll«2- 

^GR 



F(rf|g.,.,....,,l) 

P{d\'HGR.l) 



(34) 



(35) 



Again, one computes the 2^ — 1 individual Bayes fac- 
tors of each of the alternative hypotheses versus 
GR. The evaluation of the odds ratio requires that we 
use specific values for the prior odds ratios. We will set 



them equal to each other, as we did in Eq. (28) 

m..2....ji) 



p{n 



GR 



II) 



p{nGR\i) 



for any k, I < Nt, 

(36) 
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in which case the odds ratio (^T)QmodGR propor- 
tional to a straightforward average of the Bayes factors. 
We note once again that other choices could in principle 
be made. If one expects a violation to be mainly visible 
in a particular phase coefficient (see, e.g., Table I in j52|). 
then one might want to downweight the more inclusive 
hypotheses. Or, one may argue that a deviation will most 
likely affect all of the coefficients starting from some PN 
order, but then it would not necessarily be sensible to 
give more inclusive hypotheses a lower weight. Hence 
we assign equal prior odds to all of the sub-hypotheses 

Hij^i2...ik ■ 

Also as before, we let 



Pin 



modGR 



|I) 



= a, 



(37) 



where we do not specify a; it will end up being an overall 
prefactor in the odds ratio. The equality (37 1, together 
with (36 1 and the logical disjointness of the 2^^^ — 1 hy- 



potheses implies 

^(HgrJI) 



2Nt _ 1 • 



(38) 



In terms of the Hi-^i^,,,i^ , the odds ratio can then be writ- 
ten as 



Nt 

(AfT)r)modGR _ " \ ' 

^GR - - 1 

fc=l ii<i2<---<H 



B^i^^ (39) 



Analogously to (20|-(22), we will use priors on the pa- 
rameters 5xi 



(40) 



which are constant within some large box centered on the 
origin. 



D. Combining information from multiple sources 

Although the detection rate for binary neutron stars is 
still rather uncertain, we expect advanced instruments to 
detect several events per year [IJ] • It is therefore impor- 
tant to take advantage of multiple detections to provide 
tighter constraints on the validity of GR. Consider a set 
of N independent GW events, corresponding to N inde- 
pendent data sets cIa . We do not assume that deviations 
from GR are necessarily consistent between events, but 
rather that they can vary from source to source. We 
assume there is a common underlying theory of gravity 
that describes emission of gravitational waves from the 
sources that are observed, but that shifts in the values 
of the parameters can vary from one source to another, 
over and above the dependence of the parameters on the 
masses. One can write down a combined odds ratio for 



the catalog of sources: 

(A^Tj/omodGR 
'-^GR 



-P("HmodGRMl, ■ ■ - ^d^,!) 
P(HGRMl,...,rfAA,I) 



k=l Z^ii 



<i2<-<ik 



P{UGK\dl 
^ ^ P{Hiii-2 
k=l ii<i2< - <i 



,d^^,l) 

\^ (cat) -nilh-.-ik {at\ 

P(Hgr|I) ' ^ ^ 



where 



(cat) r)iii2---it _ P(dl, . . . , C^A^ | ffjii^ . . .j^ , I) , , 

""^K - P(di,...,dA,|HGR,I) • 

Since the events c?i, . . . , dj\f are all independent, one has 
P{di,...,du\H,,,„„,„l) = \{P{dA\H,,,„„,,,l), 

A=l 
J\f 

P{di,...,d^\HGK,l) = n ^('^^I^GR,!). (43) 



Thus, 



with 



AT 



(cat)^ili2---ifc _ TT (A) j^iii2---i 
GR X X GR 



A=l 



{A) r>ili2---ik 



P{dA\H,,,„„,,,l) 
P{dA\nGRj) ' 



(44) 



(45) 



To evaluate the combined odds ratio of the catalog we 
choose to invoke indifference (as in Eqs. (28) and (36)) 



and set the individual prior odds ratios equal to each 
other, so that 



P{H,,^2...^k\^) 

P(Hgr|I) 



2Nt _ 1 ■ 



(46) 



Together with Eqns. (|41|, (|44|, this leads to 

(Afr)/nmodGR 



^GR - 



fc=l ii<i2<...<ik A=l 



(47) 

which, up to an overall prefactor, amounts to taking the 
average of the cumulative Bayes factors (44). 



Alternatively, one may prefer not to make any assump- 
tions for the prior odds ratios P{Hi^i2,,,iJl) / PCHgr]^) 
at all, and focus on the cumulative Bayes factors 
(cat)^n«2 - ifc separately and individually. It will also be 
of interest to look at the cumulative Bayes factors for the 
various hypotheses Hi-^i2...ik and Hqr against the noise- 
only hypothesis 'Hnoisc: 

n(A)mii2...ifc ^ TT P{dA\Hi^i2...ik7^) 
^ noise n p^dA\nnoisc,l) ' 

Y\{A)^GR _ TT P{dA\nGR,l) , . 
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and we note that 



(cat) n«ii2- 



n 



(A) TDiii2---ik 
A noise 



n 



V noise 



(49) 



Finally, we will also look at the individual contributions 
^^B]^.-'" and for A = 1, 2, . . . , A^. 



We take the prior on 5xi to be flat and centered on 
zero, with a total width of 0.5. This will be much larger 
than the deviations we will use in simulated signals and 
hence suffices to illustrate the method; for real measure- 
ments one may want to choose a still wider prior. 

For the purposes of showing results, the factor a in 
Eqns. (39 1 and (47) will be set to one. 



IV. RESULTS 

To illustrate the method, we construct catalogs of 15 
binary neutron star sources, distributed uniformly in vol- 
ume, with random sky positions and orientations, and 
whose total number is taken to be on the conservative 
side of the 'realistic' estimates in [15] for the number 
of detectable sources in a one-year time span. We take 
the individual neutron star masses to lie between 1 and 
2 Mq. The distance interval is between 100 Mpc and 
400 Mpc; the former number is the radius within which 
one would expect ^0.5 BNS inspirals per year, and 400 
Mpc is the approximate horizon distance in Advanced 
LIGO. The corresponding signals are added coherently 
to stationary, Gaussian simulated data for the Advanced 
Virgo interferometer and the two Advanced LIGOs. A 
lower cut-off of 8 is implemented on the optimal net- 
work SNR, defined as the quadrature sum of individual 
detector SNRs. The analysis of the surviving signals is 
performed with an appropriately modified version of the 
nested sampling code available in LAL [50] . 

For simplicity, we took the phase up to only 2PN order 
(M = 4), both in the signal and in the model waveforms. 
As an example, we considered the case of Nt ~ 3 test- 
ing coefficients "01 j "02, ■03- Nested sampling then gives 
23 - 1 = 7 Bayes factors B^r, BI^, BI^, B},\, B},\, 
B^j^, and B^^, which for a single source can be combined 
to form the odds ratio ( 39 1 . Note that up to and includ- 



ing 2PN order, the post-Newtonian coefficients take the 
general form [23| 

UM,v) = Y^9^iv) i^M ^-3/5)(-5)/3, (50) 

where the gi{r]) are polynomials in 77; for the lowest three 
sub-leading FN orders one has 

/ N ^ / N 20 /743 11 \ 

(51) 

Accordingly, in the model waveforms we allow for defor- 
mations 



128r; 
3 

128ry 



92{v) i^M V-'^'r' [1 + 6x2] 



(52) 



A. Measurability of a deviation in a 
post-Newtonian coefficient 

In order to gauge the sensitivity of our method to de- 
viations from GR, we first consider a large number of 
simulated signals with a constant relative offset Sx3- If 
GR is violated, we do not expect the deviation to be 
this simple, but these examples will serve to illustrate 
both the workings and the effectiveness of the technique 
for low-SNR sources of the kind we expect in Advanced 
LIGO and Virgo. We note that ^3 is the lowest-order 
coefficient which incorporates the non-linearity of Gen- 
eral Relativity through so-called tail effects [201 H] , and 
is therefore of particular interest. 



1. Signals with constant relative deviation Sxs = 0.1 

We start with signals that have 6x3 = 0.1. We 
first compute the odds ratios for individual sources. 



(^^)Og°^^^, according to (39), with Nt = 3. Next we 



divide these up randomly into catalogs of 15 sources each 
and compute the combined odds ratios (^T)(^modGR 



(47). We do the same thing for injections that are pure 
GR, i.e. Sxi = for all i, and again compute the quanti- 
ties (^fT)(9modGR foj. individual sources, and (WT)0modGR 

for catalogs of 15 sources each. 

Before considering catalogs, let us look at the (log) 
odds ratios for individual sources as a function of SNR. 
This is shown in Fig. [4] The overwhelming majority of 
signals have SNR between 8 and 15, consistent with our 
SNR cut and the placement of sources uniformly in vol- 
ume up to 400 Mpc. Even for low SNR sources, there is 
separation between the GR injections and the injections 
with modified ■03- As one would expect, the separation 
becomes much clearer with increasing SNR. 

Fig. [5] shows normalized distributions of the log odds 
ratios, both for individual sources and for catalogs of 
15 sources each. Combining the odds ratios for sources 
within a catalog will strongly boost our confidence in a 
violation of GR if one is present at the given level. For 
this particular choice of Sx3 in the injected non-GR sig- 
nals, the separation from the GR injections is complete. 

It is useful to look at which of the Bayes fac- 
tors of the component hypotheses tend to give the 
largest contribution to the odds ratio. What is 
computed directly by the nested sampling code is 



not Bgr , . 



r3 r12 
' ^GR' ^GR' ■ 



'-^GR'^GR' b^t rather the 
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FIG. 4; The log odds ratios for individual sources. The blue 
crosses represent signals with standard GR waveforms, the 
red circles signals with a constant 10% relative offset in -03. 
A separation between the two is visible for SNR > 10 and 
becomes more pronounced as the SNR increases. 



Bayes factors for each of the hypotheses against the noise- 
only hypothesis 'Hnoise: 



and one has 
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In Fig. § we show the cumulative number of times that 
a particular B'^^^^l'' is the largest, against SNR, for the 
case where the injections have Sxa — 0.1. The results 
are entirely as expected, considering that the injected 
waveform has a shift in V'a only: 

• The Bayes factor B^^^^^ corresponding to the hy- 
pothesis dominates; 



• The Bayes factors BJ-^^^J" corresponding to hy- 
potheses that involve V'3 being non-GR tend to out- 
perform those that do not; 

• The Bayes factors for the non-GR hypotheses devi- 
ate from the GR one already at low SNR, showing 
that our method will perform well in the low-SNR 
scenario. 

Because of the first two points, one may be tempted to 
assign different prior odds to the various hypotheses in- 
stead of setting them all equal to each other. For instance 
one might consider downweighting the most inclusive hy- 
pothesis, i?i23, by invoking Occam's razor. However, the 
violation of GR we assume here is of a rather special 
form. In reality one will not know beforehand what the 
nature of the deviation will be; in particular, its effect 
may not be restricted to a single phase coefhcient. It is 
possible that all coefficients are affected, in which case 



FIG. 5: Top: The normalized distribution P(lnOGH'*°^) of 
log odds ratios for individual sources, where the injections 
are either GR or have Sxs = 0.1. Bottom: The normalized 
distribution P(ln Cqr*''^) of logs of the combined odds ratios 
for GR injections and injections with 5x3 = 0.1, for catalogs of 
15 sources each. The effectiveness of the catalog approach to 
testing for deviations from GR comes from the combination 
of multiple sources, each source contributing to the overall 
result in proportion to its own Bayes factors. 



one would not want to a priori deprecate i?i23. As ex- 
plained in Sec. |III C| our hypothesis HmodOR corresponds 
to the question whether one or more of the phasing coef- 
ficients {tpi, V'2, V's} differ from their GR values; one may 
want to ask a different question, but this is the one that 
is the most general within our framework. To retain full 
generality, all sub-hypotheses Hi^i^, ^i^ need to be taken 
into account and given equal weight. 



2. Signals with constant relative deviation &xs ~ 0.025 

It is clear that, if signals arriving at the Advanced 
Virgo-LIGO network would have a (constant) fractional 
deviation in 1/^3 as large as 10%, then at least under the 
assumption of Gaussian noise, we would have no trouble 
in discerning this violation of GR even if only 15 events 
were ever recorded. Now let us look at a smaller devia- 
tion in ■03; say, 2.5%. 
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FIG. 6: Top; curves and left vertical axis: For a given SNR, 
the cumulative number of times that the Bayes factor against 
noise for a particular component hypothesis is the largest for 
injections with that SNR or below, for 5^3 = 0.1. All 1471 
simulated sources were used. As expected, -Bnoiso dominates, 
and Bayes factors for hypotheses that let ips be non-GR tend 
to outperform those that do not. The GR model outperforms 
the one with the largest number of free parameters. His- 
togram and right vertical axis: The number of sources per 
SNR bin. 

Bottom: The same as above, but restricting to sources with 
SNR < 12. Similar behavior as for the full set of 1471 sources 
is observed. Note that already at SNR close to threshold, the 
GR hypothesis is more likely to be disfavored. 



In Fig. [7] we plot the log odds ratios for individual 
sources against SNR, both for signals with GR waveforms 
and signals with 6x3 = 0.025. This time the two distri- 
butions largely coincide, although there are some outliers 
which could boost the combined odds ratio when they are 
present in a catalog of sources. 

In Fig. [8] we show normalized distributions of the log 
odds ratio for individual sources, as well as for catalogs 
with 15 sources each. For individual sources, the distri- 
butions arc more or less on top of each other. The picture 
is somewhat different for the catalogs. If a catalog with 
6x3 = 0.025 happens to contain one of the outliers visible 
m Fig. [7j then it can boost the combined odds ratio for 
the catalog. 

It is instructive to look at a representative catalog with 
In (3)0niodGR > 0. In Fig. [9] we show the build-up of the 
log Bayes factors for the various sub-hypotheses against 
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FIG. 7: The log odds ratios for individual sources. The blue 
crosses are for signals with standard GR waveforms, the red 
circles for signals with a constant 2.5% relative offset in ^3. 
This time there is little separation between the two, although 
there are outliers for SNR > 15. 



GR, as well as the odds ratio itself. In a scenario where 
the evidence for a GR violation is marginal, it is imper- 
ative to include as many hypotheses as possible in the 
analysis. Indeed, in this example, the hypothesis is 
not the most favored one; instead, it is Hi. Note also that 
if we had only tested the most inclusive hypothesis H123 
against GR (as one might do if one expects a deviation 
in all of the FN parameters), we would have concluded 
that the GR hypothesis is the favored one. The same 
is true if we had only tested H2 , as one would do when 
specifically looking for a 'massive graviton'. Even the log 
Bayes factor for H23 ends up being negative. We also re- 
mind the reader that we will not know beforehand what 
the precise nature of the GR violation will be. 

By using the distribution of log odds ratios for simu- 
lated catalogs of GR sources, one can establish a threshold 
which the odds ratio of a given catalog must overcome 
in order that a violation of GR becomes credible. This 
would be the analog of what was done in [H7] (see Fig. 7 
of that paper), where the distribution of the log Bayes 
factor lni?g N for the presence of a signal versus noise- 
only was computed for many realizations of the noise, 
in the absence of a signal. Consider the distribution 

P (In (^^)0™°dGR|^^ y^^^^ J-) of Q(j(jg j,j^fjo ^j^g goi_ 

lection k of simulated catalogs of signals that are in ac- 
cordance with GR. Given a 'false alarm probability' /3, a 
threshold InO^ for the odds ratio can be set as follows: 



/3 = 



P{\iiO\K,nGR,l)dlnO. 



(55) 



in Ob 



Now suppose we also have a distribution 
p(ln(JVT)0modGR|^/ -^^j^j^ of log odds ratio for a 

collection k' of simulated catalogs of signals which follow 
some alternative theory (in this example, one which 
leads to a shift 6x3 = 0.025). Then we can quantify the 
chance that a deviation from GR of this particular kind 
will be detected with a false alarm probability smaller 
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FIG. 8: Top: The normalized distribution P(ln Ogj^'^^^) of 
log odds ratios for individual sources, where the injections 
are either GR or have = 0.025. Bottom: The normal- 
ized distribution P(ln C'gr'^^^) of logs of the combined odds 
ratios for GR injections and injections with = 0.025, for 
catalogs of 15 sources each. For individual sources, the two 
distributions essentially lie on top of each other. However, 
when sources are combined into catalogs, it is possible for an 
outlier to boost the odds ratio of the entire catalog. 



than the given /3, by means of an efficiency defined as 



/ P{lnO\K\n,n,l)d\nO. (56) 



We note that with these definitions, the efhciency is inde- 
pendent of the overall prior odds of "HmodGR versus Hqr, 
as the factor a in Eqns. (l37| and (47) will just cause 
both the distributions P (liT^^^Ogg^ |k, Hgr, l) and 
P (In '^^^'>0'^^^'^^\k','H^iu I) , and the threshold In 0^3, to 
be shifted by In a. 

In the present example, with Sxs = 0.025 and catalogs 
of 15 sources each, for /3 = 0.05 one has ( = 0.22. Hence, 
by this standard and for the given number of sources in 
a catalog, a GR violation of this kind is just borderline 
detectable. 

However, the number of sources per catalog, 15 in the 
examples shown so far, and the chosen false alarm prob- 
ability, P = 0.05, are somewhat arbitrary. In reality the 
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FIG. 9: The build-up of cumulative Bayes factors against GR 
for individual hypotheses, and the odds ratio, for a typical 
catalog with 5x3 = 0.025. Note that on the basis of the Bayes 
factor against GR of the most inclusive hypothesis ^123 alone, 
one would have concluded that the GR model is in fact the 
favored one. Even the log Bayes factor for H23 ends up being 
negative. Additionally, the hypothesis with the largest Bayes 
factor is not H3 but Hi. This illustrates that it is necessary 
to include as many hypotheses as possible in the analysis. 



size of the catalog will depend on the number of detected 
sources, and the false alarm probability is set according 
to the required confidence. It is therefore of interest to 
investigate the effects of both of these factors. In Fig. [lO] 
we show the behavior of the efficiency as a function of the 
catalog size and the false alarm probability. To account 
for the arbitrariness in which the sources are combined to 
form a catalog, we show the median and 68% confidence 
interval of the efficiencies from 5000 random source order- 
ings as the central curves and the error bars, respectively. 
Results are shown for /3 e {0.05, 0.01}. 

As is evident from Fig.jlOj the efficiency rises as a func- 
tion of the catalog size. This highlights the importance of 
combining all available sources in the advanced detector 
era, when one looks for deviations from GR. The maxi- 
mum catalog sizes shown are comparable to the 'realistic' 
estimates of the number of detections of binary neutron 
star inspirals in a the span of a year |15| . 

We see that Sx3 = 0.025 is a borderline case in terms of 
discernability of a GR violation. Later on, when we show 
posterior PDFs, it will become evident that indeed, Sxs 
can typically be measured with an an accuracy of this 
order. 



B. Effect of number of hypotheses used 

It is of interest to see what would have happened if 
we had used a smaller number of testing coefficients; say, 
{V'li ^2}, so that the hypotheses to be tested are Hi, H2, 
and Hi2. In the example with Sxs — 0.025 as in the 
previous subsection, the PN order where the deviation 
occurs, namely 1.5PN, would then be higher than the 
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FIG. 10: The efficiency of detecting a GR violation for sources 
with 5x3 = 0.025, as a function of catalog size for false alarm 
probabilities /3 £ {0.05, 0.01}. The median and the 68% confi- 
dence interval from 5000 random catalog orderings are shown 
as the central curve and the error bars, respectively. The 
efficiency increases as a function of catalog size, once again 
underscoring the benefit of combining all available data. 



PN orders associated with our testing coefficients, which 
are 0.5PN and IPN. 

In Fig. [TT] the following two things are shown: 

• In the case where only {■01, '02} are testing coeffi- 
cients, we compute the thresholds In^^'^Op corre- 
sponding to false alarm probabilities (FAPs) (3 g 
{0.05,0.01}. Next, we re-calculate the false alarm 
probabilities for the same thresholds, but now for 
the case where there are three testing parameters, 
{V'l, V'2, "03}, and show the difference in false alarm 
probabilities; 

• On the other hand, one can compare the efficiencies 
^^^C and (■^^C for the two and three parameter cases, 
for fixed false alarm probabilities (3 e {0.05,0.01}. 
This is shown in the bottom panel. 

As expected, in the first case (fixed thresholds for the 
odds ratios), the false alarm probabilities increase in go- 
ing from two to three testing parameters, but only mod- 
erately so. On the other hand, for fixed false alarm prob- 
abilities, there is no appreciable change in efficiency. In- 
deed, the spread in the GR 'background' will increase 
with an increase in hypotheses to be tested against GR; 
yet, having more hypotheses does not really hurt us in 
terms of our ability to detect a deviation from GR. 

Fig. [TT] indicates the typical behavior for catalogs with 
a specific deviation from GR, in this case 6x3 = 0.025. It 
is worth repeating, however, that especially when there 
is only marginal evidence for a GR violation, it is im- 
portant to use as many hypotheses as is computationally 
feasible; see Fig. |9 (and also Fig. 16 below). Also, we 
will obviously not know beforehand what the nature of 
the GR violation is. 



One may nevertheless wonder how our 3-parameter 
test would compare with a 'targeted search' that only 
looks for a deviation in , which in this example happens 
to be where the deviation actually is. With our choice of 
a = 1, this corresponds to setting O^g^^^ = ^''''*^-Bgr- 
Fig. [12] shows the change in false alarm probabilities in 
going from testing only Hr^ to the full test for fixed log 
odds ratio thresholds, as well as the change in efficien- 
cies for fixed false alarm probabilities. The results are as 
follows: 

• The change in false alarm probabilities for fixed log 
odds ratio thresholds is minor; 

• However, especially for a large number of sources 
per catalog, the efficiencies show a clear rise. This 
can be accounted for by the fact that, for a small 
violation of GR, it will not always be the case that 
the Bayes factor against GR for H3 is the largest, 
but our method is able to compensate for that. 

We conclude that for this particular example, our method 
with Nt = 3 testing parameters will tend to outperform 
a 'targeted search' that happens to look for the violation 
actually present. However, we do not expect this to be 
true for more complicated deviations from GR. 



C. Measurability of deviations with non-PN 
frequency dependences 

The aim of the previous subsections was to get a rough 
idea of the sensitivity of our method to deviations in post- 
Newtonian coefficients, and in order to gauge this we as- 
sumed a constant relative offset in the physically interest- 
ing parameter 03. However, we stress once again that we 
do not expect a violation of GR to manifest itself as a sim- 
ple constant relative shift in one of the post-Newtonian 
coefficients. Even if modifications are confined to the 
PN coefficients, the 6xi in the signals can be dependent 
on (M, rj), in addition to whatever charges and coupling 
constants may be present. Moreover, a deviation from 
GR could introduce terms in the phase with frequency 
dependences that do not correspond to any of the PN 
contributions. We now show that the method can also 
be sensitive to violations of that kind, even though the 
model waveforms we use in our analyses only have defor- 
mations of PN terms. Let us give an heuristic example 
where the phase of the simulated signals contains a term 
with an anomalous frequency dependence in between that 
of the IPN and 1.5PN contributions. Specifically, 
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and we note that the IPN term goes like and the 
1.5PN term like f~'^l'^\ thus, the deviation introduced 
here could be dubbed '1.25PN'. However, for the recov- 
ery, we will continue to use the same model waveforms as 
before, which can only have shifts in the phase coefficients 
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FIG. 11: Top: The change in false alarm probabilities (FAPs) 
in going from two to three testing parameters, but keeping 
the odds ratio thresholds fixed. Bottom: The change in ef- 
ficiencies when keeping the false alarm probabilities fixed. 
The plots shown are for the case where the signals have 
^X3 = 0.025. We see that increasing the number of testing 
parameters has only a moderate effect on the FAPs, while for 
fixed FAPs, the efficiencies do no change appreciably. Note, 
however, that when the evidence for a deviation from GR is 
marginal, the use of as many hypotheses as possible can be 
pivotal in finding the violation (see Fig. [o] and also Fig. 16 1. 



at 0.5PN, IPN, and 1.5PN. Our aim in this subsection 
is to show that they will nevertheless allow us to find a 



deviation in the signal of the form (57 1. 

We now need to make a choice for SxA- We aim to 
show that even if there is a deviation in the phase that 
is not represented in any of our model waveforms, it can 
be recovered, if near the 'bucket' of the noise curve (at 
/ ^ 150 Hz) the amount by which it affects the phase is 
on a par with a shift in the PN coefficients of more than 
a few percent. 

For definiteness, let us take Sxa to be constant, and 
such that at / = 150 Hz and for a system of (1.5, 1.5) Mq, 
the contribution ( 57 ) to the phase is equal to the change 
caused by a shift in the 1.5PN contribution with Sxs = 
0.1: 

(ttS M0)~5/'^(5xa(15O Hz)-^/*' 

= 53(0.25) (7r3Af0)-2/3 x 0.1 x (150 Hz)-2/3^(58) 
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FIG. 12: Top: The difference of false alarm probabilities, 
(3)^ — (target)^^ j-^^. g^^^^j ^Qg odds ratio thresholds and signals 
having Sxa ~ 0.025, between our 3-parameter test and a 'tar- 
geted search' which only looks for a deviation in ■03, i.e., only 
tests the hypothesis H3 against GR. We see that the differ- 
ence is minor. Bottom: More important is the difference in 
efficiencies, '"^■'(^ — for fixed false alarm probabilities. 

Especially for a large number of sources per catalog, our 3- 
parameter test is actually more efficient than the 'targeted 
search', at least for this particular example. This is because 
the Bayes factor against GR for H3 will not be the largest 
in every catalog, but our method naturally compensates for 
that. Of course, we do not expect our method to outperform 
a targeted search in the case of a more complicated deviation 
from GR. 



leading to Sxa — —2.2. 

As before, we first give results for the odds ratios of 
individual sources with increasing SNR; see Fig. [13] We 
see that even at small SNR there is already a good sepa- 
ration between the GR injections and the injections with 
a modification in the structure of the phase. 

Next we show normalized distributions of the log odds 
ratios, both for individual sources and for catalogs of 15 
sources each: Fig. 14 



As expected, for the catalogs there 
is an excellent separation between the GR injections and 
injections with a modification in the phase. 

Now let us look at the cumulative number of times that 
the Bayes factor against noise for a particular hypothesis 
is the largest, for individual sources with Sxa = —2.2, 
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FIG. 13: The log odds ratios for individual sources for injec- 
tions with an anomalous frequency dependence in the phase. 
The blue crosses represent GR injections, while the red circles 
are for signals that have a contribution to the phase with a 
frequency dependence in between that of the IPN and 1.5PN 
terms ('1.25PN'). As with the 5x3 = 0-1 example, a separa- 
tion between the two is visible for SNR > 10 and becomes 
more pronounced as the SNR increases. 



arranged with increasing SNR (Fig. 15). From SNR > 
15, the Bayes factor B^^^^^ starts to dominate, followed 
by Snoiso and i?noiscJ ■^itl^ tlis latter two crossing over 
between SNR - 20 and SNR - 25. Aheady at SNR - 9, 
all of the B^oSc '" dominate the Bayes factor B^^^^ for 
the GR hypothesis. However, near the SNR threshold, 
no single hypothesis dominates clearly, which again shows 
that as many hypotheses as possible should be included 
in the analysis. 

Especially in this case, it is interesting to look at the 
growth of cumulative Bayes factors against GR for indi- 
vidual hypotheses, as well as of the odds ratio, as sources 
with increasing SNR are being added within catalogs of 
15 sources. This is shown for a few example catalogs in 



Fig. 16 The salient features are: 



• Even if all 15 sources only have modest SNR, by 
their cumulative contributions they can cause a rel- 
atively large odds ratio for the catalog as a whole; 

• In catalogs containing a source with a particularly 
hight SNR, it is by no means a given that the con- 
tribution of this source will dominate the odds ra- 
tio compared to the cumulative contributions of the 
other sources; 

• Which hypothesis comes out on top will vary from 
one catalog to another; in the examples of Fig. [16] 
we see H12, H2, or H23 giving the largest contri- 
bution, respectively, but there are examples where 
any of the other four sub-hypotheses contributes 
the most. In this respect we note that the odds 
ratio for a catalog is proportional to the average 
of the cumulative Bayes factors themselves, not of 
their logarithms. If one were to a priori favor par- 
ticular (subsets of) hypotheses, the log odds ratio 
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FIG. 14: Top: The normalized distribution P(lnOCT'*°^) 



of 



log odds ratios for individual sources, where the injections 
are either GR or have the anomalous frequency dependence. 
Bottom: The normalized distribution P(ln C»gr'*°^) of logs 
of the combined odds ratios for GR injections and injections 
for catalogs of 15 sources each. 



could be lowered by as much as 100. This could 
have a large effect on the false alarm probability; 
see Fig. [14] These are again arguments for using 
as many sub-hypotheses as possible, and give them 
equal relative prior odds. 

Note that in principle we could have extended our 
model waveforms with more free parameters so as to be 
more sensitive to deviations of this kind, e.g. by includ- 
ing a term in the phase of the form Af"', similar to what 
one has in the so-called parameterized post-Einsteinian 
(ppE) waveforms [50]. However, the point of our method 
is to search for generic deviations from GR. In the future, 
we will want to search with more sophisticated (time do- 
main) waveforms whose use is computationally more de- 
manding in a Bayesian setting, and we will not be able to 
allow for an arbitrarily large number of deformations in 
the model waveforms. However, here we have an exam- 
ple where the injections have a contribution to the phase 
that is not present in any of the model waveforms, yet 
is clearly observable. Recall that the overall magnitude 
of the anomalous phase contribution was chosen so that. 
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FIG. 15: Top; curves and left vertical axis: The cumulative 
number of times that the Bayes factor against noise for a par- 
ticular component hypothesis is the largest, with increasing 
SNR, for individual sources with 5xa ~ —2.2. AH 854 sim- 
ulated sources were used. Histogram and right vertical axis: 
The number of sources per SNR bin. 

Bottom: The same as above, but now for sources with SNR 
< 12. Close to threshold, no single hypothesis is the dominant 
one. 



at / ^ 150 Hz, it changes the phase by a similar amount 
as a shift in tp^ of 10%. Our results make it plausible 
that generically, when there is a deviation in the phase 
which, at frequencies where the detectors are the most 
sensitive, causes a phase change on a par with a change 
in one of the (low order) PN coefficients of more than a 
few percent, it will be detectable. 



D. Parameter estimation 

Finally, let us look at some posterior PDFs for the 5xi- 
We stress that unlike Bayes factors, the PDFs can not be 
combined across sources since we should not expect the 
Sxi to be independent of the component masses; they can 
differ from source to source. Even looking at the PDFs 
for a single source may then be misleading: even if the 
deviation is exactly in one or more of the PN coefficients, 
a given source will have values for the Sxi that are repre- 
sentative just for the {A4,ri) of that source, and possibly 
also the values of additional charges that may appear in 
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FIG. 16: A few examples of how cumulative Bayes factors 
against GR for individual hypotheses, and the odds ratio, 
grow as sources with increasing SNR are being added within 
three different catalogs of 15 sources in total. Top and middle: 
Catalogs where sources have only modest SNRs (< 20). Note 
the large differences in contributions from different hypothe- 
ses, and in the ordering of Bayes factors, between these two 
catalogs. Bottom: A catalog with a high SNR source; note 
however that the source with the highest SNR does not cause 
a particularly large 'boost', and the cumulative log odds ratio 
ends up being considerably lower than in the top and middle 
examples. 



an alternative theory of gravity. In a given catalog, there 
may be only one source with sufficient SNR to allow for 
accurate parameter estimation, in which case the posteri- 
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ors will not tell us much even if they are strongly peaked. 
More generally, the deviation from GR may manifest it- 
self by the appearance of terms in the phase that do not 
have the frequency dependence of any of the PN contri- 
butions. However, in the event that the log odds ratio 
and Bayes factors strongly favor the GR hypothesis, pos- 
terior PDFs will allow us to constrain deviations in the 
PN coefficients, thereby adding further support that GR 
is the correct theory. Hence we start with an analysis of 
pure GR injections. 



1. Example: A GR injection 

Let us first look at a GR source with {M,r],D) = 
(1.31 M©, 0.243, 131 Mpc), and a LIGO- Virgo network 
SNR of 23.0. The Bayes factors for the various com- 
ponent hypotheses are: 



-2, 

-3, InB'Si. 



-2, 

-1, 
-2. 



lni?a'R = 



-2, 
-1, 



(59) 



We recall that these are the Bayes factors for a particular 
deviation from GR versus GR. The GR hypothesis is fa- 
vored in all cases. We can also look at the Bayes factors 
for all of the hypotheses against noise: 
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(60) 



Hence the signal is picked up very well by the waveforms 
of all of the hypotheses, with the GR waveform doing 
slightly better. 

Let us now look at some posterior PDFs. In Fig. [iTj 
we show the PDFs for Sxi, Sx2, and Sxa, respectively 
for the waveforms that have free parameters {9^6xi}, 
{9,6x2}, and {0,6x3}, with 9 the parameters of the GR 
waveform. We see that the distributions are all narrowly 
peaked around the correct value of zero. 



2. Example: A signal with 5^3 = 0.1. 

We now consider an example with [Ai, ri,D) = 
(1.18 Mq, 0.244, 196 Mpc), with a non-zero relative shift 
in ^■i of 6x3 = 0.1, and network SNR 23.2. The Bayes 
factors are: 
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FIG. 17: Posterior PDFs for a single GR injection with net- 
work SNR of 23.0. Top: 5xi measured with a waveform that 
has {9, 5xi} as free parameters; middle: measured with a 
waveform that has {9,&X2} free; bottom: 5x3 measured with 
a waveform that has {^,5x3} free. In each case the distribu- 
tion is tightly centered on zero, with standard deviations of 
0.014, 0.015, and 0.019, respectively. 



This time the GR hypothesis is very much disfavored. 
However, we note that the Bayes factor for the hypothesis 
that only -03 differs from its GR value is not the largest. 
In fact, all the Bayes factors except for Bq^ and B}?^ 
are rather similar in magnitude, and no clear conclusions 
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can be drawn from them regarding the underlying nature 
of the deviation from GR. 

When looking at the Bayes factors against noise, we 
see that the signal is clearly detected for all hypotheses: 
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Now let us consider posterior PDFs. We expect the 
PDF of (5x3 for the hypothesis H^, where only {6,6x3} 
are allowed to vary, to be peaked at the injected value 
of 0.1, and this is the case with very good accuracy, as 
shown in the bottom panel of Fig. [18] 

In the upper and middle panels of the same figure, the 
PDFs of Sxi for the hypothesis Hi, and of 6x2 for the 
hypothesis H2 are shown. In these cases the parameter 
in the signal that has the shift is now not represented; in 
the first case only Sxi is allowed to vary on top of the 
parameters 9 of GR, and in the second case only 6x2 ■ In 
the nested sampling process, the waveform will still try to 
adapt itself to the deformation in the signal. The result is 
that Sxi and 6x2 are strongly peaked, but away from the 
correct values Sxi — 6x2 — 0. Thus, if one were to study 
the data only using waveforms from a specific alternative 
theory of gravity {e.g. a 'massive graviton' model with a 
deviation in 7/12 only), one might find a violation of GR 
but draw the wrong conclusions about the nature of the 
deviation. 

We can also look at the PDF for the hypothesis H123, 
where the waveforms have Sxi, 6x2, 6x3 free; see Fig. [19] 
Once again the peak is more or less at the correct value 
of 6x3 1 but we now have a much bigger spread. This too 
is as expected; parameter estimation degrades if one tries 
to measure too many parameters at once. 

Finally, we look at the two-dimensional PDF for 
{6x2,6x3}, in the case where the waveform is the one 
that tests the hypothesis H123', Fig. [20j Here too there 
is little to learn about the underlying nature of the devi- 
ation. 



3. Example: A signal with 5x3 = 0.025 

Let us consider an example with {M,rj,D) = 
(1.14M0, 0.242, 216Mpc), 6x3 = 0.025, and a network 
SNR of 20.6. As expected, the Bayes factors for the com- 
ponent hypotheses against GR are considerably smaller 
than in the case of 6x3 = 0.1, but GR is still disfavored: 
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FIG. 18: Posterior PDFs for a single injection with 5x3 = 0.1 
and network SNR 23.2. Top: 5xi measured with a waveform 
that has {6, Sxi} free; middle: 6x2 measured with a waveform 
that has {6,6x2} free; bottom: Sxa measured with a wave- 
form that has {0,5x3} free. As expected, the bottom PDF 
is sharply peaked at the correct value of Sx3 ~ 0.1, with a 
standard deviation of 0.012. For the top and middle PDFs, 
the one test parameter that is used differs from the parameter 
in the signal that has the shift. The parameters in the wave- 
form will rearrange themselves such as to best accommodate 
the properties of the signal. Both Sxi and 6x2 end up being 
sharply peaked, but not at the correct value of zero. 
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FIG. 19: The posterior PDF for the injection with Sxs = 0.1 
as in the previous figure, but recovered with waveforms where 
{0,Sxi,Sx2,5x3} are all free. The peak is near the correct 
value of Sxs (with a median of 0.083), but this time the spread 
is considerably larger (with a standard deviation of 0.055), as 
we are trying to measure more parameters at the same time. 



one should not expect the same to happen for lower-SNR 
sources. 

Let us also look at the PDF for Sxi when {6, Sxi} are 
free parameters, and of 6x2 when {6, 6x2} are free; see 
the top and middle plots of Fig. [21] As before, 5xi and 
6x2 are not peaked at the right values of 6x1 = 6x2 — 0. 



4. Example: A signal with non-PN frequency dependence in 
the phasing 

We now look at a signal with a non-standard contribu- 
tion to the phase, with a frequency dependence between 
IPN and 1.5PN, as in Eq. (57|. In the example we use 
here, {M,ri, D) 



(1.29 Af©, 0.250, 208 Mpc), with a net- 
work SNR of 22.4. The Bayes factors for the component 
hypotheses against GR are: 



lnS^R=91, InB^j, 



93, lnB^R = 89, 
91, lni?23^ = 92, 
91. 



(65) 
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FIG. 20: The 68% and 95% confidence contours of the two- 
dimensional PDF for {Sx2,Sx3}, still for an injection with 
5x3 = 0.1, and with a waveform that has {6, Sxi, Sx2, Sxa} as 
free parameters. The maximum of the PDF is given by the 
black dot and the injection values are represented by the red 
cross. 



Also as expected, the signal is easily found by all of the 
model waveforms: 
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As before we look at the posterior PDF of 6x3 for the 
hypothesis H3, where only {0,6x3} are allowed to vary: 
see the bottom plot in Fig. [21] The distribution is peaked 
near the correct value and stays away from zero; however, 



Thus, also in this case the GR hypothesis is very much 
disfavored, despite the fact that none of our model 
waveforms contain the anomalous frequency dependence 
which is present in the phase of the signal. We can also 
look at the Bayes factors against noise: 
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(66) 



It is interesting to look at the posterior PDF of 6x3 
for the case where {9, 6x3} are allowed to vary (bottom 
panel of Fig. 22 ) . The distribution looks uncannily like 
the analogous one for a signal with 6x3 = 0.1; see the 
bottom panel of Fig. [l8j We can also look at the PDF of 
6x1 in the case where ^Xi} are free parameters, and 
the PDF of 6x2 when {6, 6x2} are free; see the top and 
middle panels of Fig. [22| Here too there is an interesting 
resemblance to the analogous panels in Fig. [TSj for an 
injection with 5x3 = 0.1. 

In summary, 

• Also here, the Bayes factors against GR clearly dis- 
favor the GR hypothesis, despite the fact that none 
of our model waveforms has the kind of non-PN 
contribution to the phase that the signal contains; 

• As before, the Bayes factors against GR are quite 
close to each other, and one cannot conclude much 
from them about the nature of the underlying de- 
viation from GR; 

• The Bayes factors against noise indicate that the 
signal will not be missed; 
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FIG. 21: The posterior PDFs for ^xi (top), (middle), 
and 5x3 (bottom) for a single injection with 5x3 = 0.025 and 
network SNR 20.6, recovered with waveforms where, respec- 
tively, {6l,5xi}; {^i^Xa}, and {6,5x3} are free. Again 5x3 is 
peaked at close to the correct value, with a median of 0.027 
and a standard deviation of 0.0092, but both 5xi and 5x2 are 
strongly peaked at incorrect values. 



• The posteriors are quite similar to the ones where 
the deviation from GR is purely in the 1.5PN coef- 
ficient, with (5x3 = 0.1. 

Finally, let us look at the two-dimensional PDF for 
{(5X2, (^Xs} the case where the waveform is the one 



FIG. 22: The posterior PDFs for 5xi (top), 5x2 (middle), and 
5x3 (bottom) for a single injection with 5xa = —2.2 and net- 
work SNR 22.4, recovered with waveforms where, respectively, 
{S,5xi}, {6,5x2}, and {6,5x3} are free. The distribution of 
5x3 has its median at 0.11 and a standard deviation of 0.017. 
Note the remarkable resemblance with Fig. |18[ where the sig- 
nal had 5x3 = 0.1. Also for 5xi and 5x2, the distributions 
are very similar to the ones for a signal with 5x3 =0.1. 



tliat tests i?i23; Fig. 23 Unlike the one-dimensional 
PDFs, here there is not much resemblance with the two- 
dimensional PDF for (5x3 = 0.1 (Fig. 20). Still, nothing 



much can be learned about the actual nature of the vio- 
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lation. 
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FIG. 23: The 68% and 95% confidence contours of the two- 
dimensional PDF for {5x2, Sxa}, for an injection with Sxa = 
—2.2, and with a waveform that has {6, Sxi, 5x2, Sxs} as free 
parameters. The maximum of the PDF is given by the black 
dot. Here the d istr ibution is quite different from the case with 
5X3 = 0.1 (Fig. [20 1. 



This shows once again that we will be able to also 
discern violations of GR of a kind that has no analog in 
the model waveforms. In the posteriors, these can even 
'masquerade' as deviations in one of the post-Newtonian 
coefficients, if, for example, one would only be looking 
for a 'massive graviton' with a waveform model that has 
{9,6x2} as free parameters. 



E. A note on parameter estimation and multiple 
sources 

We end this section with some cautionary remarks on 
the use of parameter estimation in testing GR. As we 
have seen in subsection |IIID1 model selection allows for 
combining the information from multiple sources to com- 
pute a single odds ratio, but for parameter estimation the 
situation is quite different. 

In the examples presented in this paper, the deviations 
6x3 and 6xA were taken to be the same for all sources. 
Hence, in principle we could have combined PDFs from 
multiple sources, simply by multiplying them, to arrive 
at more accurate measurements. However, we reiterate 
that in reality, one cannot expect the deviations to be 
constant in this fashion; rather, they might vary from 
source to source, with dependence on the masses as well 
as whatever additional charges may be present in the cor- 
rect theory of gravity. If it so happens that deviations in 
individual sources can go both ways, making positive or 
negative contributions to the overall phase depending on 
the parameters of the source, then a combined PDF may 
not show any significant deviation at all. Moreover, since 
most sources will have SNRs near threshold, it could be 
that only a few sources will allow for accurate parameter 



estimation; here we only showed PDFs for single, rela- 
tively 'loud' sources with SNR > 20. However, if there is 
significant dependence of the GR deviation on source pa- 
rameters (unlike in the heuristic examples shown here), 
the particular deviation exhibited by the loudest source 
may not be representative. Hence, in contrast to model 
selection, parameter estimation alone does not provide a 
solid foundation to look for deviations from GR. 



V. CONCLUSIONS AND FUTURE 
DIRECTIONS 

As we showed at the beginning, it is possible for a 
signal to contain very significant deviations from General 
Relativity while still being detectable with a template 
family of GR waveforms; this is the 'fundamental bias' 
discussed in [321 [SUl . A violation of GR can in principle 
take any form, and the question which then presents itself 
is how to search for generic deviations. 

We have developed a general method to search for de- 
viations from General Relativity using signals from com- 
pact binary coalescence events. To this end we con- 
structed an odds ratio Oq^^*^^ for modifications to GR 
against GR, which is the posterior probability that there 
is a deviation from GR, versus GR being correct. This 
odds ratio can be written as a linear combination of Bayes 
factors Bq^'"^'' for hypotheses Hi^i^,,,i^, in each of which 
one or more of the phase parameters i/'i is assumed to de- 
viate from the GR value, without actually assuming any 
specific dependence on the frequency and/or physical pa- 
rameters pertinent to a given theory. Since this includes 
hypotheses where only a single one of the is non-GR, 
our method will be particularly well-suited in low-SNR 
scenarios, which we expect to be in with the upcoming 
advanced detector network. Finally, information from 
multiple sources can easily be combined to arrive at an 
odds ratio CQg*^^^ for the 'catalog' of all observed events. 

The method we developed, applied to phase coefficients 
in inspiral waveforms, essentially addresses the question 
"Do one or more of the coefficients differ from the val- 
ues predicted by GR?" This is in contrast with previous 
Bayesian analyses such as [351 IHl [S3] , where effectively, 
the question being asked was limited to "Do all of the 
additional free parameters introduced differ from their 
GR values?" In addition to being better adapted to a 
low-SNR environment, the test proposed here is far more 
general. 

In order to gauge how large a deviation might be de- 
tectable, we first considered signals with a constant frac- 
tional deviation in the 1.5PN coefficient ip^. This coef- 
ficient is of particular interest, since it incorporates the 
so-called 'tail effects' JO, .21^ (as well as spin-orbit cou- 
pling |22| , although we did not consider spin here) , which 
are not empirically accessible with binary pulsar observa- 
tions and can only be studied through direct detection of 
gravitational waves. When considering catalogs of only 
15 binary neutron star sources, we saw that a deviation 
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in ips at the 10% level would easily be detectable. In fact, 
even a deviation at the few percent level can be discern- 
able. This is confirmed by posterior PDFs for -03 in the 
case where this is the only parameter that is assumed to 
deviate from its GR value. 

We also considered a deviation in the phase with a 
frequency dependence that does not match any of the 
post-Newtonian terms, and hence is not present in any 
of the recovery waveforms that we used. More precisely, 
we looked at signals whose phase has an additional con- 
tribution, with a frequency dependence in between that 
of the IPN and 1.5PN terms ('1.25PN'). The magnitude 
of the deviation was chosen such that near / ~ 150 Hz, 
where the detectors are the most sensitive, the change in 
phase is roughly the same as the change caused by a 10% 
shift in the 1.5PN coefficient. The deviation was clearly 
detectable in the log odds ratios, the Bayes factors, and 
the posterior PDFs. We expect this to be an instance of 
a more general fact. Namely, even if there is a deviation 
in the phase which the model waveforms technically do 
not allow for, it will typically be observable, on condition 
that near the 'bucket' it causes a change in the phase that 
is on a par with the effect of a shift in the (low order) 
PN phase coefficients of more than a few percent. 

In order to establish the basic validity and usefulness 
of our method for testing GR, we considered constant 
fractional deviations in ■03, and a non-PN frequency de- 
pendence in the signal. However, even if there is a devi- 
ation in one or more of the PN phase contributions only, 
these may depend on {M , 77) as well as whatever addi- 
tional charges and coupling constant might be present. 
It would be of great interest to study the effects of more 
general deviations from GR on the odds ratio OqJ^'^'^^, 
the cumulative Bayes factors Y[a '-^'> Bq^'"^'' for the com- 
ponent hypotheses Hi-^i^,,,i^, againts Hgr, and the cumu- 
lative Bayes factors Y[a ''^''-^noSe against noise. Addi- 
tionally, in future one should consider deviations in the 
amplitude as well, and use model waveforms which have 
such freedom. A priori there is no reason not to use an 
arbitrary number of free coefficients in both the phase 
and the amplitude of recovery waveforms, the only lim- 
iting factor being computational time. 

Should no deviation from GR be convincingly found 
through model selection, one will still be interested in 
constraining the theory. Advanced gravitational wave 
detectors will then take us considerably beyond the bi- 
nary pulsar observations. We recall that even the IPN 
phase coefficient -02 is not fully constrained by the latter, 
since the dissipative dynamics is only probed to leading 
PN order. As we have seen from PDFs, with a single com- 
pact binary coalescence event in Advanced LIGO/ Virgo 
at SNR ~ 20, the coefficient -02 can be constrained to 
better than 2%, and the same is true of the 1.5PN coef- 
ficient 03. Mainly for computational reasons, we did not 
study constraints on the 2PN and higher-order terms, but 
it would clearly be of interest to see how well one can pin 
down the corresponding coefficients. We also draw at- 
tention to the results for the 0.5PN contribution, which 



in General Relativity is identically zero. For simplicity 
we restricted the mass range of our simulated sources to 
that of binary neutron stars. Given the very encouraging 
results, in future work the BHNS and BBH ranges should 
also be investigated. 

The preliminary results presented here motivate the 
construction of a full data analysis pipeline for testing or 
constraining General Relativity. To have a real chance of 
finding a deviation, much more sophisticated GR wave- 
forms will have to be used, with inclusion of merger and 
ringdown, higher harmonics both in the inspiral and ring- 
down parts, dynamical spins, and residual eccentricity. 
The development of such waveforms, with input from 
numerical simulations, is currently a subject of intense 
investigations jlH [72H8T] . We note that the method we 
presented is not tied to any particular waveform approx- 
imant. Moreover, the deformations need not be in the 
phase. Indeed, in the future one would presumably want 
to use time domain waveforms, for which it may be more 
convenient (and physically more appropriate) to intro- 
duce parameterized deformations directly in coefficients 
appearing in, e.g., a Hamiltonian used to evolve the in- 
spiral part of the waveform. Irrespective of the parame- 
terization, one would still be able to associate with it an 
exhaustive set of logically disjoint hypotheses iJijij - H - 

Once sufficiently accurate waveforms are available, a 
test of GR on Advanced LIGO/ Virgo could go as follows: 

• Starting from the best available GR waveforms, 
introduce parameterized deformations, leading to 
disjoint hypotheses like our -ffiiij - i/c ; which to- 
gether form "HmodGR; 

• Use many injections of GR waveforms in real or 
realistic data, arranged into simulated 'catalogs', 
to investigate the distributions of the cumulative 
odds ratio Oq^'*^^ as well as of the cumulative 
Bayes factors when GR is correct. Use these to set 
thresholds for the measured odds ratio and Bayes 
factors to overcome; 

• Apply our method to the catalog of sources actually 
found by the detectors. If the measured cumulative 
odds ratio Cqr^^^ is below threshold, then there is 
no real reason to believe that a deviation from GR 
is present. The posterior PDFs for the free phase 
and amplitude coefficients in the model waveforms, 
taken from the highest-SNR sources, will provide 
(potentially very strong) constraints on these pa- 
rameters; 

• If the measured odds ratio is above threshold, then 
a violation of GR is likely. As we have seen, Bayes 
factors and PDFs can be misleading in trying to 
find out what the precise nature of the deviation 
may be. However, one may be able to follow up 
on the violation by again using our method, this 
time with waveforms with more complicated defor- 
mations and a larger number of free parameters. 
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inspired by particular alternative theories of grav- 
ity, similar to what is done in ppE [501 . 

Thus, although much work remains to be done, we 
have the basics of a very general method for testing Gen- 
eral Relativity using compact binary coalescence events 
to be detected by the upcoming Advanced LIGO and 
Advanced Virgo observatories. 
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